Abstract. A Hamming compatible metric is an integer-valued metric on the words of a
finite alphabet which agrees with the usual Hamming distance for words of equal length.
We define a new Hamming compatible metric, compute the cardinality of a sphere with
respect to this metric, and show this metric is minimal in the class of all "well-behaved"
Hamming compatible metrics.
1. Introduction

Ever since Richard Hamming's seminal 1950 paper [2], the notion of the Hamming distance
has played a fundamental role in the development of coding theory, error-correcting
codes, cryptography, telecommunication, and information theory. Simply put, the Hamming
distance between two words (or strings) of equal length counts the number of places
where the corresponding letters differ. For instance, 'coma' and 'comb' have Hamming
distance 1, while 'sunny' and 'burnt' have Hamming distance 3, and it is easy to check
that this is in fact a metric on the set of words of a given length.
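The two examples above can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name `hamming` is an assumption of this example and does not come from the paper.

```python
def hamming(u: str, v: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(u) != len(v):
        raise ValueError("the Hamming distance needs words of equal length")
    return sum(a != b for a, b in zip(u, v))


print(hamming("coma", "comb"))    # 1
print(hamming("sunny", "burnt"))  # 3
```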

However, the classical Hamming distance is restrictive in that it only measures the 
distance between two words of equal length. One would also like to have a similar metric 
for words of different lengths which agrees with the Hamming distance when those words 
are of the same length. To this end, the second author introduced [1] the notion of an 
integer-valued Hamming compatible metric and gave a natural example. 

A natural question that arises is "How small can such a metric be?" Using only the 
axioms of a metric and the fact that it is integer-valued and Hamming compatible, one 
may try to find the smallest such metric on a language. Unfortunately, these properties 
alone are not enough to say something substantive about the minimality of such metrics, 
and we will see there are some examples which do not have the desired behavior. On the 
other hand, if we make some natural uniformity assumptions on the metric, we can show 
that there is indeed a smallest such metric. 

In what follows, we fix a finite set $\Sigma$, called an alphabet, and denote by $\Sigma^n$ the set
of $n$-letter words. The collection of all words of finite length is denoted $\Sigma^*$, called the
improper language. All metrics are assumed to be integer-valued.

2. The $d_2$ metric

We begin with the following characterization of Hamming distance, $H$.

Proposition 2.1. Let $d : \Sigma^* \times \Sigma^* \to \mathbb{Z}_{\ge 0}$ be a mapping such that the induced map
$d_n$ on $\Sigma^n \times \Sigma^n$ is a metric for every $n \in \mathbb{Z}_{>0}$. Then $d = H$ is the Hamming distance on
$\Sigma^n$ for every $n \in \mathbb{Z}_{>0}$ if and only if

(1) for all words $u, v$ with $n = l(u) = l(v)$ we have $d(u, v) \le n$, and

(2) $d(u_1 v_1, u_2 v_2) = d(u_1, u_2) + d(v_1, v_2)$ whenever $l(u_1) = l(u_2)$ and $l(v_1) = l(v_2)$.

Proof. It is obvious by the definition that the Hamming distance satisfies properties (1)
and (2). To show the converse, we induct on $n = l(u) = l(v)$. For $n = 1$, $u = a$ and $v = b$
are phonemes. Property (1) implies that $d(a, b) \le 1$, and $d(a, b)$ can only be 0 if $a = b$
since $d$ is a metric. Hence, if $a \ne b$, we must have $d(a, b) = 1$. So $d$ is the Hamming
distance for phonemes, i.e. for words of length 1.

For the inductive step let u', v' be words of length n + 1 and write u' = au, v' = bv 
where a, b are phonemes and u, v are words of length n. Using property (2), the n = 1 
case, and the inductive hypothesis we have 

d(u', v') = d(a, b) + d(u, v) = H(a, b) + H(u, v) = H(u', v'). 

Hence, d is the Hamming distance. □ 

Definition 2.2. A metric $d$ on the language $\Sigma^*$ is called Hamming compatible if for any
$n \in \mathbb{Z}_{>0}$, $d(u, v) = H(u, v)$ for all $u, v \in \Sigma^n$.

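Let $\hat{H}(u, v)$ be the truncated Hamming function, defined as follows. If $l(u) \ge l(v)$,
then $\hat{u}$ drops the last $l(u) - l(v)$ letters of $u$ and $\hat{v} = v$, so that $\hat{H}(u, v) = H(\hat{u}, \hat{v})$ is the
usual Hamming distance between two words of length $l(v)$; the case $l(u) < l(v)$ is
symmetric. Observe that the truncated Hamming function is not a metric! For instance,
$\hat{H}(u, v) = 0$ whenever $v$ is an initial segment of $u$, even if $u \ne v$.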

Example 2.3. In [1], the second author defined $T(u, v) := \hat{H}(u, v) + |l(u) - l(v)|$, where
$\hat{H}$ is the truncated Hamming function, and showed that it is a Hamming compatible
metric. It is easily seen to be Hamming compatible, and one checks the triangle
inequality by exhausting the cases.
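As a minimal sketch of these definitions, assuming words are represented as Python strings, the truncated Hamming function and the metric $T$ might be written as follows. The names `truncated_hamming` and `T` are chosen for this illustration only.

```python
def truncated_hamming(u: str, v: str) -> int:
    """Hamming distance after cutting the longer word down to the shorter length."""
    # zip stops at the end of the shorter word, which performs the truncation.
    return sum(a != b for a, b in zip(u, v))


def T(u: str, v: str) -> int:
    """Truncated Hamming distance plus the difference of the lengths."""
    return truncated_hamming(u, v) + abs(len(u) - len(v))


print(T("coma", "comb"))                 # equal lengths, so the plain Hamming distance: 1
print(truncated_hamming("com", "comb"))  # 0, although the two words differ
print(T("com", "comb"))                  # 1
```

The second line of output illustrates why the truncated Hamming function alone is not a metric: two distinct words can be at distance 0. The length term in `T` repairs this.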

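Define the following:

\[ d_n(u, v) = \hat{H}(u, v) + \gamma_n(u, v), \]

where $\hat{H}$ is the truncated Hamming function and

\[ \gamma_n(u, v) = \begin{cases} \dfrac{|l(u) - l(v)|}{n} & \text{if } |l(u) - l(v)| \equiv 0 \pmod{n}, \\[1ex] \dfrac{|l(u) - l(v)| + n - i}{n} & \text{if } |l(u) - l(v)| \equiv i \pmod{n} \text{ for some } i \in \{1, \ldots, n - 1\}, \end{cases} \]

for words $u, v$ in the language $\Sigma^*$ over an alphabet $\Sigma$. Let $N = |\Sigma|$.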

If $n \ge 3$, then $d_n$ is not a metric, as the following example shows. Let $\Sigma = \{0, 1\}$,
$w = 0^n$, $u = 0^{2n}$, and $v = 0^n 1^n 0^n$. Here we mean that $w$ is the $n$-letter word consisting
solely of zeros, and so on.

Then,

\[ d_n(u, v) = n + \frac{n}{n} = n + 1, \]
\[ d_n(u, w) = 0 + \frac{n}{n} = 1, \]
\[ d_n(v, w) = 0 + \frac{2n}{n} = 2, \]

so we have that $d_n(u, v) > d_n(u, w) + d_n(v, w)$.
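The failure of the triangle inequality can be checked numerically for $n = 3$. The sketch below assumes the words $w = 0^n$, $u = 0^{2n}$, $v = 0^n 1^n 0^n$ and computes the length term as an integer ceiling of the length difference divided by $n$, which agrees with the two-case formula. The helper names are illustrative.

```python
def truncated_hamming(u: str, v: str) -> int:
    """Hamming distance of the two words cut to the shorter length."""
    return sum(a != b for a, b in zip(u, v))


def gamma_n(u: str, v: str, n: int) -> int:
    """Length difference divided by n, rounded up."""
    diff = abs(len(u) - len(v))
    return -(-diff // n)  # integer ceiling of diff / n


def d_n(u: str, v: str, n: int) -> int:
    return truncated_hamming(u, v) + gamma_n(u, v, n)


n = 3
w = "0" * n
u = "0" * (2 * n)
v = "0" * n + "1" * n + "0" * n

lhs = d_n(u, v, n)
rhs = d_n(u, w, n) + d_n(v, w, n)
print(lhs, rhs)  # 4 3, so the triangle inequality fails
```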
However, for $n = 2$, this is an integer-valued, Hamming compatible metric. In this
case,

\[ d_2(u, v) = \hat{H}(u, v) + \gamma_2(u, v), \]

where

\[ \gamma_2(u, v) = \begin{cases} \dfrac{|l(u) - l(v)|}{2} & \text{if } l(u) - l(v) \text{ is even}, \\[1ex] \dfrac{|l(u) - l(v)| + 1}{2} & \text{if } l(u) - l(v) \text{ is odd}. \end{cases} \]

Another way to write this is to say $\gamma_2(u, v) = \lceil |l(u) - l(v)| / 2 \rceil$, i.e. the least integer that is
greater than or equal to $|l(u) - l(v)| / 2$, also called the round up, or ceiling function.

Proposition 2.4. The function $d_2(u, v) = \hat{H}(u, v) + \lceil |l(u) - l(v)| / 2 \rceil$ is a metric on $\Sigma^*$.

Proof. We check the triangle inequality for $\delta = d_2$. Let $u, v, w \in \Sigma^*$, and set
$D = \delta(u, w) + \delta(w, v) - \delta(u, v)$. We must show $D \ge 0$ in all cases.

Case 1: $l(w) \le l(u) \le l(v)$

Then we may write $u = u_1 u_2$, $v = v_1 v_2 v_3$ where $l(w) = l(u_1) = l(v_1)$ and $l(u_2) = l(v_2)$.
We also have

\[ \delta(u, v) = H(u_1, v_1) + H(u_2, v_2) + \frac{l(v_3) + \alpha_1}{2}, \]
\[ \delta(u, w) = H(u_1, w) + \frac{l(u_2) + \alpha_2}{2}, \]
\[ \delta(w, v) = H(v_1, w) + \frac{l(v_2) + l(v_3) + \alpha_3}{2}, \]

where each $\alpha_i \in \{0, 1\}$, with $\alpha_i = 1$ exactly when the corresponding length difference
is odd. Then we have

\[ D = [H(u_1, w) + H(v_1, w) - H(u_1, v_1)] + [l(u_2) - H(u_2, v_2)] + \frac{\alpha_2 + \alpha_3 - \alpha_1}{2}. \]

Since $H$ is a metric on $\Sigma^n$ for $n = l(w)$, the first term is nonnegative. The second term is
nonnegative, as it is just property (1) from the previous proposition. It suffices to show
the third term $L = \frac{\alpha_2 + \alpha_3 - \alpha_1}{2}$ is nonnegative.

Subcase 1A: $l(u) - l(v)$ is even. Then $l(u) - l(w)$ and $l(v) - l(w)$ are either both even
or both odd. If both are even, then $\alpha_1 = \alpha_2 = \alpha_3 = 0$, so $L = 0 \ge 0$. If both are odd,
then $\alpha_1 = 0$, $\alpha_2 = \alpha_3 = 1$, so $L = 1 \ge 0$.

Subcase 1B: $l(u) - l(v)$ is odd. Then exactly one of $l(u) - l(w)$ and $l(v) - l(w)$ is odd,
and the other is even. Hence $\alpha_1 = 1$ and $\{\alpha_2, \alpha_3\} = \{0, 1\}$, so $L = 0 \ge 0$.

Case 2: $l(u) \le l(w) \le l(v)$

Then we may write $w = w_1 w_2$, $v = v_1 v_2 v_3$ where $l(u) = l(w_1) = l(v_1)$ and $l(w_2) = l(v_2)$.
We also have

\[ \delta(u, v) = H(u, v_1) + \frac{l(v_2) + l(v_3) + \alpha_1}{2}, \]
\[ \delta(u, w) = H(u, w_1) + \frac{l(w_2) + \alpha_2}{2}, \]
\[ \delta(w, v) = H(w_1, v_1) + H(w_2, v_2) + \frac{l(v_3) + \alpha_3}{2}, \]

where each $\alpha_i \in \{0, 1\}$. Then we have

\[ D = [H(u, w_1) + H(w_1, v_1) - H(u, v_1)] + H(w_2, v_2) + \frac{\alpha_2 + \alpha_3 - \alpha_1}{2}. \]

Again, it suffices to show the third term $L = \frac{\alpha_2 + \alpha_3 - \alpha_1}{2}$ is nonnegative.

Subcase 2A: $l(u) - l(v)$ is even. Then $l(u) - l(w)$ and $l(v) - l(w)$ are either both even
or both odd. If both are even, then $\alpha_1 = \alpha_2 = \alpha_3 = 0$, so $L = 0 \ge 0$. If both are odd,
then $\alpha_1 = 0$, $\alpha_2 = \alpha_3 = 1$, so $L = 1 \ge 0$.

Subcase 2B: $l(u) - l(v)$ is odd. Then exactly one of $l(u) - l(w)$ and $l(v) - l(w)$ is odd,
and the other is even. Hence $\alpha_1 = 1$ and $\{\alpha_2, \alpha_3\} = \{0, 1\}$, so $L = 0 \ge 0$.

Case 3: $l(u) \le l(v) \le l(w)$

Then we may write $v = v_1 v_2$, $w = w_1 w_2 w_3$ where $l(u) = l(v_1) = l(w_1)$ and $l(v_2) = l(w_2)$.
We also have

\[ \delta(u, v) = H(u, v_1) + \frac{l(v_2) + \alpha_1}{2}, \]
\[ \delta(u, w) = H(u, w_1) + \frac{l(w_2) + l(w_3) + \alpha_2}{2}, \]
\[ \delta(w, v) = H(w_1, v_1) + H(w_2, v_2) + \frac{l(w_3) + \alpha_3}{2}, \]

where each $\alpha_i \in \{0, 1\}$. Then we have

\[ D = [H(u, w_1) + H(w_1, v_1) - H(u, v_1)] + H(w_2, v_2) + l(w_3) + \frac{\alpha_2 + \alpha_3 - \alpha_1}{2}. \]

Again, it suffices to show the third term $L = \frac{\alpha_2 + \alpha_3 - \alpha_1}{2}$ is nonnegative. An argument
analogous to the previous cases shows that $L \ge 0$.

Hence, in all cases $\delta = d_2$ satisfies the triangle inequality (and obviously symmetry
and reflexivity) so it is an integer-valued metric, and it is obvious that it is Hamming
compatible. □
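Proposition 2.4 can also be sanity-checked by brute force on short words. The sketch below assumes the binary alphabet and words of length at most 4, so it supplements the proof rather than replacing it. The names `d2`, `words`, and `triangle_violations` are illustrative.

```python
from itertools import product


def d2(u: str, v: str) -> int:
    """Truncated Hamming distance plus half the length difference, rounded up."""
    hamming_part = sum(a != b for a, b in zip(u, v))
    return hamming_part + (abs(len(u) - len(v)) + 1) // 2


def words(alphabet: str, max_len: int):
    """All words over the alphabet of length 0..max_len ('' is the empty word)."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def triangle_violations(dist, alphabet: str = "01", max_len: int = 4):
    """Triples (u, v, w) with dist(u, v) > dist(u, w) + dist(w, v)."""
    ws = list(words(alphabet, max_len))
    return [(u, v, w) for u in ws for v in ws for w in ws
            if dist(u, v) > dist(u, w) + dist(w, v)]


print(len(triangle_violations(d2)))  # 0
```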

Remark 2.5. It was asked by the second author in [1] if the metric $T$ in Example 2.3 is
minimal in its class. That is to say, if $\delta$ is a Hamming compatible metric, must it always
be the case that $\delta \ge T$? As we have just seen, $d_2$ is such a metric that satisfies $d_2 \le T$
by definition, with strict inequality holding for appropriate pairs of words. However, as
we will see in Section 4, there exist Hamming compatible metrics that take values even
smaller than $d_2$.

3. Cardinality of a sphere 

Spheres with respect to the Hamming distance are related to the concept of error
correcting codes. Define the sphere of radius $r$ centered at $u$ to be
$S_r(u) = \{v \in \Sigma^* \mid d_2(u, v) = r\}$. We also define the following sphere for words of fixed
length, $S_r^j(u) = \{v \in \Sigma^j \mid d_2(u, v) = r\}$, where $\Sigma^j$ is the set of all $j$-letter words in $\Sigma^*$.

Here, we are aiming to compute the cardinality of the sphere $S_r(u)$.

Lemma 3.1. Suppose $u$ is a word of length $k$. Then

\[ |S_r(u)| = \sum_{j = k - 2r}^{k + 2r} |S_r^j(u)|, \]

where $S_r^j(u)$ is understood to be empty for $j < 0$.

Proof. If $v$ is a word of length $j$ with $d_2(u, v) = r$, then we must have $k - 2r \le j \le k + 2r$,
else $\gamma_2(u, v) > r$. Since $S_r(u)$ is a disjoint union of these $S_r^j(u)$, the equality follows
immediately. □

Now we compute the cardinality of $S_r^j(u)$ for a word $u$ of length $k$. Let $v$ be a word
of length $j$ with $d_2(u, v) = r$. Let $a = \gamma_2(u, v)$, i.e. $a$ is the smallest integer such that
$|k - j| \le 2a$, i.e. $a = \lceil |k - j| / 2 \rceil$ is the roundup of half $|k - j|$. Then $\hat{H}(u, v) = r - a$, so the
truncations must differ in exactly $r - a$ places. If $j \le k$, then there are $\binom{j}{r - a} (N - 1)^{r - a}$
such $j$-letter words that differ from $u$ in exactly $r - a$ places. If $j > k$, then there are
$\binom{k}{r - a} (N - 1)^{r - a} N^{j - k}$ such $j$-letter words. Hence, we have proved

Lemma 3.2. Suppose $u$ is a word of length $k$. Then

\[ |S_r^j(u)| = \begin{cases} \dbinom{j}{r - a} (N - 1)^{r - a} & \text{if } j \le k, \\[1ex] \dbinom{k}{r - a} (N - 1)^{r - a} N^{j - k} & \text{if } j > k, \end{cases} \]

where $a = \lceil |k - j| / 2 \rceil$.
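The counting formula of Lemma 3.2 can be compared against direct enumeration. The following sketch assumes a small alphabet given as a Python string and treats the binomial coefficient as zero when $r - a$ is negative or exceeds the number of available positions. All function names are illustrative.

```python
from itertools import product
from math import comb


def d2(u: str, v: str) -> int:
    """Truncated Hamming distance plus half the length difference, rounded up."""
    return sum(a != b for a, b in zip(u, v)) + (abs(len(u) - len(v)) + 1) // 2


def sphere_size_formula(k: int, j: int, r: int, N: int) -> int:
    """Number of j-letter words at d2-distance r from a word of length k."""
    a = (abs(k - j) + 1) // 2
    if r - a < 0:
        return 0
    if j <= k:
        return comb(j, r - a) * (N - 1) ** (r - a)
    return comb(k, r - a) * (N - 1) ** (r - a) * N ** (j - k)


def sphere_size_brute(u: str, j: int, r: int, alphabet: str) -> int:
    """Same count, obtained by listing every j-letter word."""
    return sum(1 for letters in product(alphabet, repeat=j)
               if d2(u, "".join(letters)) == r)


def sphere_total(k: int, r: int, N: int) -> int:
    """Sum over the admissible lengths j, as in Lemma 3.1."""
    return sum(sphere_size_formula(k, j, r, N)
               for j in range(max(0, k - 2 * r), k + 2 * r + 1))


u = "0110"
print(sphere_size_formula(len(u), 5, 2, 2), sphere_size_brute(u, 5, 2, "01"))  # 8 8
```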

4. Minimality of the $d_2$ metric

Here we explore lower bounds of a Hamming compatible metric $\delta$ on a language $\Sigma^*$.
Recall that $\varepsilon$ is the empty word.

Definition 4.1. We say that two words $u, v \in \Sigma^*$ of the same length $n = l(u) = l(v)$ are
Hamming opposites if the Hamming distance between them is maximal, i.e. $H(u, v) = n$.

Remark 4.2. Let $\Sigma$ be an alphabet. If $|\Sigma| = k$ and $u \in \Sigma^*$ with length $n \ge 1$, then $u$
has exactly $(k - 1)^n$ Hamming opposite(s).

Definition 4.3. We say that $\delta$ is weakly uniform if for any Hamming opposites $u, v$ we
have that $\delta(u, \varepsilon) = \delta(v, \varepsilon)$.

Proposition 4.4. If $\delta$ is weakly uniform, then $\delta(u, \varepsilon) \ge \frac{l(u)}{2}$ for all $u \in \Sigma^*$. In particular,
$\delta(u, \varepsilon) \to \infty$ as $l(u) \to \infty$.

Proof. Let $u$ be a word of length $l(u) = n$, and choose a Hamming opposite $v$. Then the
triangle inequality gives

\[ n = H(u, v) = \delta(u, v) \le \delta(u, \varepsilon) + \delta(v, \varepsilon) = 2\delta(u, \varepsilon), \]

hence $\delta(u, \varepsilon) \ge \frac{n}{2}$. □

Remark 4.5. The above argument shows that even when $\delta$ is not weakly uniform, given
any word $u$ of length $n$ and any Hamming opposite $v$, then at least one of $u$ or $v$ must
satisfy the above inequality, since either $\delta(u, \varepsilon) \ge \delta(v, \varepsilon)$ or $\delta(u, \varepsilon) \le \delta(v, \varepsilon)$. Moreover,
it is important to note that this inequality is sharp, since our weakly uniform metric $d_2$
above satisfies $d_2(u, \varepsilon) = \lceil l(u) / 2 \rceil$.

Denote by $\hat{H} = \hat{H}(u, v)$ the Hamming distance of the truncation of $u, v \in \Sigma^*$. Given
a Hamming compatible metric $\delta$, set $\gamma = \delta - \hat{H}$. Thus, we may write $\delta = \hat{H} + \gamma$ where
$\hat{H}$ is the truncated Hamming distance. Observe that $\delta$ is Hamming compatible if and
only if $\gamma(u, v) = 0$ whenever $l(u) = l(v)$.

Since $\hat{H}(u, \varepsilon) = 0$ implies $\delta(u, \varepsilon) = \gamma(u, \varepsilon)$, requiring that $\delta$ be weakly uniform is
equivalent to requiring that $\gamma(u, \varepsilon) = \gamma(v, \varepsilon)$ for all Hamming opposites $u, v$.



6 PARSA BAKHTARY, OTHMAN ECHI 

Definition 4.6. Given $\delta = \hat{H} + \gamma$ as above, we say that $\delta$ is uniform if given any pair of
Hamming opposites $u, v \in \Sigma^n$, we have $\gamma(u, w) = \gamma(v, w)$ for all $w \in \Sigma^*$.

Example 4.7. $T(u, v) := \hat{H}(u, v) + |l(u) - l(v)|$, so here $\gamma = |l(u) - l(v)|$.

Example 4.8. $d_2(u, v) = \hat{H}(u, v) + \lceil |l(u) - l(v)| / 2 \rceil$, so here $\gamma = \lceil |l(u) - l(v)| / 2 \rceil$.

Whereas weak uniformity is the simple notion that Hamming opposites are equidistant
from the empty word $\varepsilon$, this definition of uniformity is slightly more mysterious. If we fix
$w = \varepsilon$ this condition reduces to weak uniformity. Notice both examples above are uniform,
and more generally, if $\gamma(u, w)$ depends only on the lengths $l(u), l(w)$ of the inputs $u, w$,
then $\delta$ is uniform.

Lemma 4.9. Suppose we are given words $u, w \in \Sigma^*$ with $l(u) \ge l(w)$. Then there exists
a Hamming opposite $v$ for $u$ such that

\[ \hat{H}(u, w) + \hat{H}(v, w) = l(w). \]

Proof. Write $u = u_1 u_2 \cdots u_n$ and $w = w_1 w_2 \cdots w_m$, where $m \le n$ by assumption. Then let
$k = \hat{H}(u, w) = H(u', w)$, where $u' = u_1 u_2 \cdots u_m$. Now choose $v_i$ by the following method.
For $1 \le i \le m$, set $v_i = w_i$ if $w_i \ne u_i$. If $w_i = u_i$, then take $v_i$ to be any letter in $\Sigma$
different from $u_i$. For $m < i \le n$, just take $v_i$ to be any letter in $\Sigma$ different from $u_i$.

Now, it is clear by the construction that $v = v_1 v_2 \cdots v_n$ is a Hamming opposite for $u$
since they differ in each letter place. Let $v' = v_1 v_2 \cdots v_m$, so that $\hat{H}(v, w) = H(v', w)$.
Since $u'$ and $w$ differ in $k$ places, and $v'$ agrees with $w$ in precisely those $k$ places and
differs elsewhere, $H(v', w) = m - k$. Hence, we have

\[ \hat{H}(u, w) + H(v', w) = k + (m - k) = m = l(w). \]

□
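The construction in this proof is explicit enough to run. The sketch below assumes an alphabet with at least two letters, given as a Python string, and picks the first available letter wherever the proof allows an arbitrary choice. The function names are illustrative.

```python
def truncated_hamming(u: str, v: str) -> int:
    """Hamming distance of the two words cut to the shorter length."""
    return sum(a != b for a, b in zip(u, v))


def hamming_opposite_for(u: str, w: str, alphabet: str) -> str:
    """Build a Hamming opposite v of u that copies w wherever u and w differ."""
    if len(u) < len(w):
        raise ValueError("the construction assumes l(u) >= l(w)")

    def other_letter(c: str) -> str:
        # Any letter different from c will do; take the first one.
        return next(x for x in alphabet if x != c)

    v = []
    for i, c in enumerate(u):
        if i < len(w) and w[i] != c:
            v.append(w[i])             # agree with w where u disagrees with w
        else:
            v.append(other_letter(c))  # elsewhere, just differ from u
    return "".join(v)


u, w = "01101", "0011"
v = hamming_opposite_for(u, w, "01")
print(v)                                                # 10010
print(truncated_hamming(u, w) + truncated_hamming(v, w))  # 4, the length of w
```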

The following theorem is the main result. 

Theorem 4.10. Suppose $\delta$ is a uniform Hamming compatible metric on a language $\Sigma^*$.
Then $\delta(u, v) \ge d_2(u, v)$ for all $u, v \in \Sigma^*$. In particular, $\delta(u, v) \to \infty$ as $|l(u) - l(v)| \to \infty$.

Proof. We must show that for all $u, w \in \Sigma^*$, $\delta(u, w) \ge d_2(u, w)$. Writing $\delta = \hat{H} + \gamma$, this
is equivalent to showing that $\gamma(u, w) \ge \lceil |l(u) - l(w)| / 2 \rceil$. Assume without loss of generality
that $l(u) \ge l(w)$, and choose a Hamming opposite $v$ for $u$ that satisfies the equality of the
above lemma. Now, we have

\[ l(u) = H(u, v) = \delta(u, v) \le \delta(u, w) + \delta(v, w) = \hat{H}(u, w) + \gamma(u, w) + \hat{H}(v, w) + \gamma(v, w). \]

By our choice of $v$, $\hat{H}(u, w) + \hat{H}(v, w) = l(w)$, so

\[ l(u) \le l(w) + \gamma(u, w) + \gamma(v, w) = l(w) + 2\gamma(u, w) \]

by the uniformity assumption. Hence, $\gamma(u, w) \ge \frac{l(u) - l(w)}{2}$. Because $\gamma$ is integer valued, it
follows that $\gamma(u, w) \ge \lceil |l(u) - l(w)| / 2 \rceil$, which completes the proof. □
Remark 4.11. We remark that given any $u, w \in \Sigma^*$ and a Hamming opposite $v$ for $u$ that
satisfies the equality of the above lemma, either $\delta(u, w) \ge d_2(u, w)$ or $\delta(v, w) \ge d_2(v, w)$
must happen, since either $\gamma(u, w) \ge \gamma(v, w)$ or $\gamma(u, w) \le \gamma(v, w)$, without any uniformity
assumption on $\delta$. This means that even the wildest Hamming compatible distances must
grow to some extent with $|l(u) - l(v)|$. To make this slightly more rigorous, given any word
$w \in \Sigma^*$, we can always find a sequence of words $u_n \in \Sigma^n$ such that $\delta(u_n, w) \ge d_2(u_n, w)$,
and in particular $\delta(u_n, w) \to \infty$ as $n \to \infty$. The uniformity assumption simply ensures
that this growth is literally uniform.

Example 4.12. Uniformity is necessary for the above minimality result to hold. Let
$\Sigma = \{0, 1\}$ and define

\[ \delta(u, \varepsilon) = \begin{cases} 0 & \text{if } u = \varepsilon, \\ 1 & \text{if } u = 000, \\ d_2(u, 000) & \text{otherwise,} \end{cases} \]

where $d_2$ is the metric defined previously. Then take $\delta(u, v) = d_2(u, v)$ for all
$u, v \in \Sigma^* - \{\varepsilon\}$. We first show $\delta$ is a metric. Since we have only changed distances to $\varepsilon$, and
$d_2$ has been shown to be a metric, we only need to check the triangle inequality for
expressions involving $\varepsilon$. First consider

\[ \delta(u, \varepsilon) \le \delta(u, v) + \delta(v, \varepsilon). \]

If $u = \varepsilon, 000$ then this is trivial. If $u \ne \varepsilon, 000$ then this becomes

\[ d_2(u, 000) \le \delta(u, v) + \delta(v, \varepsilon). \]

If $v = \varepsilon, 000$ then again this is trivial, so assuming $v \ne \varepsilon, 000$ this becomes

\[ d_2(u, 000) \le d_2(u, v) + d_2(v, 000), \]

which is true because $d_2$ is a metric. Now consider the expression

\[ \delta(u, v) \le \delta(u, \varepsilon) + \delta(v, \varepsilon). \]

Again, if $u = \varepsilon, 000$ then this is trivial, and by symmetry the same is true if $v = \varepsilon, 000$.
If neither $u$ nor $v$ is one of these words, then this becomes

\[ d_2(u, v) \le d_2(u, 000) + d_2(v, 000), \]

which is again true because $d_2$ is a metric. Hence, $\delta$ is a Hamming compatible integer
valued metric.

It is easy to see that $\delta$ is not even weakly uniform, since $\delta(0, \varepsilon) = d_2(0, 000) = 1$ and
$\delta(1, \varepsilon) = d_2(1, 000) = 2$. Moreover, by construction $\delta(000, \varepsilon) = 1 < 2 = \lceil 3/2 \rceil = d_2(000, \varepsilon)$,
so it takes values smaller than $d_2$.
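The values quoted in this example, and the triangle inequality on short words, can be checked with a short script. The sketch assumes the empty word is represented by the empty string and only tests binary words of length at most 4, so it illustrates the claim rather than proving it. The names `delta`, `words`, and `triangle_violations` are illustrative.

```python
from itertools import product


def d2(u: str, v: str) -> int:
    """Truncated Hamming distance plus half the length difference, rounded up."""
    return sum(a != b for a, b in zip(u, v)) + (abs(len(u) - len(v)) + 1) // 2


def delta(u: str, v: str) -> int:
    """d2 with the distances to the empty word '' redefined as in the example."""
    if u == "" or v == "":
        x = v if u == "" else u  # the other argument (may itself be empty)
        if x == "":
            return 0
        if x == "000":
            return 1
        return d2(x, "000")
    return d2(u, v)


def words(alphabet: str, max_len: int):
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def triangle_violations(dist, alphabet: str = "01", max_len: int = 4):
    ws = list(words(alphabet, max_len))
    return [(u, v, w) for u in ws for v in ws for w in ws
            if dist(u, v) > dist(u, w) + dist(w, v)]


print(delta("0", ""), delta("1", ""))    # 1 2: Hamming opposites at different distances
print(delta("000", ""), d2("000", ""))   # 1 2: delta is smaller than d2 here
print(len(triangle_violations(delta)))   # 0
```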
Example 4.13. The following variant of the above example shows that there exist weakly
uniform metrics which are not uniform. As before, let $\Sigma = \{0, 1\}$ and define

\[ \delta(u, 0) = \begin{cases} 0 & \text{if } u = 0, \\ 1 & \text{if } u = \varepsilon, 1, 11, \\ T(u, 11) & \text{otherwise,} \end{cases} \]

where $T$ is the metric defined previously. Then take $\delta(u, v) = T(u, v)$ for all
$u, v \in \Sigma^* - \{0\}$. We first show $\delta$ is a metric. Again, since we have only changed distances to $0$,
we have not changed the distance from $\varepsilon$ to $0$ or from $1$ to $0$, and $T$ is a metric, we only
need to check the triangle inequality for expressions involving $0$. First consider

\[ \delta(u, 0) \le \delta(u, v) + \delta(v, 0). \]

If $u = \varepsilon, 0, 1, 11$ then this is trivial. If $u \ne \varepsilon, 0, 1, 11$ then this becomes

\[ T(u, 11) \le \delta(u, v) + \delta(v, 0). \]

Now we may assume $l(u) \ge 2$. If $v = \varepsilon$ this is $T(u, 11) \le l(u) + 1$, which is true since
$T(u, 11) = \hat{H}(u, 11) + l(u) - 2 \le l(u)$. If $v = 0$ this is $T(u, 11) \le \delta(u, 0) + 0 = T(u, 11)$.
If $v = 1$ this is $T(u, 11) \le T(u, 1) + 1$. This holds, because the left hand side is

\[ \hat{H}(u, 11) + l(u) - 2 \le \hat{H}(u, 1) + l(u) - 1 + 1, \]

the right hand side. If $v = 11$ this is $T(u, 11) \le \delta(u, 11) + 1 = T(u, 11) + 1$. Now, assuming
$v \ne \varepsilon, 0, 1, 11$ this becomes

\[ T(u, 11) \le T(u, v) + T(v, 11), \]

which is true because $T$ is a metric.
Now consider the expression

\[ \delta(u, v) \le \delta(u, 0) + \delta(v, 0). \]

If $u = 0$ this is trivial. If $u = 1$, this becomes

\[ \delta(1, v) \le 1 + \delta(v, 0). \]

If $v = \varepsilon, 0, 1$ this is trivial, and if $v = 11$ this is $1 = T(1, 11) \le 1 + 1$. If $v \ne \varepsilon, 0, 1, 11$
then $l(v) \ge 2$ and this becomes $T(1, v) \le 1 + T(v, 11)$. This is

\[ \hat{H}(v, 1) + l(v) - 1 \le 1 + \hat{H}(v, 11) + l(v) - 2, \]

which holds because $\hat{H}(v, 1) \le \hat{H}(v, 11)$ when $v \ne 11$. Now suppose $u = \varepsilon$. Then our
inequality becomes $\delta(\varepsilon, v) \le 1 + \delta(v, 0)$. If $v = \varepsilon, 0, 1, 11$ this is obvious, so assume
$v \ne \varepsilon, 0, 1, 11$, so $l(v) \ge 2$, and this is $l(v) \le 1 + T(v, 11) = 1 + \hat{H}(v, 11) + l(v) - 2$. But
this is just saying that $\hat{H}(v, 11) \ge 1$, which is true since $v \ne 11$ and $l(v) \ge 2$.

If $u = 11$ then our inequality becomes $\delta(11, v) \le 1 + \delta(v, 0)$. If $v = \varepsilon, 0, 1, 11$ this is
clear, and if $v \ne \varepsilon, 0, 1, 11$ this is just saying that $T(11, v) \le 1 + T(v, 11)$, which is trivial.
The final case is when $u, v \ne \varepsilon, 0, 1, 11$, in which case our inequality becomes

\[ T(u, v) \le T(u, 11) + T(v, 11), \]

which is again true because $T$ is a metric. Hence, $\delta$ is a Hamming compatible integer
valued metric.

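Now, $\delta$ is weakly uniform because $\delta(0, \varepsilon) = \delta(1, \varepsilon) = 1$, and for $l(u) > 1$, we have
$\delta(u, \varepsilon) = T(u, \varepsilon) = T(v, \varepsilon) = \delta(v, \varepsilon)$ for all $u, v \in \Sigma^n$. However, $\delta$ is not uniform because
$\gamma(11, 0) = \delta(11, 0) - \hat{H}(11, 0) = 1 - 1 = 0$, while $\gamma(00, 0) = \delta(00, 0) - \hat{H}(00, 0) = 2 - 0 = 2$.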

5. Non-uniform metrics 

These examples illustrate that arbitrary Hamming compatible metrics may behave 
wildly in general, so some uniformity condition is necessary in order to say something 
about minimality. Furthermore, unless there are distinguished words, one desires such 
uniformity for this type of metric. 

However, given an arbitrary Hamming compatible metric $\delta$, one may naively ask the
question, how many words of a given length must satisfy the inequality of Theorem 4.10?
Suppose we have words $u \in \Sigma^n$, $w \in \Sigma^m$ with $n \ge m$ that violate the inequality of the
theorem, i.e. $\delta(u, w) < d_2(u, w)$. In view of Remark 4.11, any Hamming opposite $v$ of $u$
chosen by the method of Lemma 4.9 must then satisfy $\delta(v, w) \ge d_2(v, w)$.

Let $h := \hat{H}(u, w)$ and $N = |\Sigma|$ as before. There are $(N - 1)^n$ total Hamming opposites
of $u$, and of these there are $(N - 1)^{m - h} (N - 1)^{n - m} = (N - 1)^{n - h}$ Hamming opposites
that satisfy the equality of Lemma 4.9, and hence the inequality of Theorem 4.10. For
instance, if $w = \varepsilon$, then $h = 0$ and so every Hamming opposite $v$ of $u$ must satisfy
$\delta(v, \varepsilon) \ge d_2(v, \varepsilon) = \lceil n/2 \rceil$, which recovers the first part of Remark 4.5.
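The count $(N - 1)^{n - h}$ can be confirmed by enumeration for small cases. The sketch below assumes a three-letter alphabet and one particular pair of words chosen only for illustration. It lists every Hamming opposite of $u$ and counts those satisfying the equality of Lemma 4.9.

```python
from itertools import product


def truncated_hamming(u: str, v: str) -> int:
    """Hamming distance of the two words cut to the shorter length."""
    return sum(a != b for a, b in zip(u, v))


def count_opposites(u: str, w: str, alphabet: str):
    """Return (all Hamming opposites of u, those meeting the Lemma 4.9 equality)."""
    total = good = 0
    h = truncated_hamming(u, w)
    for letters in product(alphabet, repeat=len(u)):
        v = "".join(letters)
        if all(a != b for a, b in zip(u, v)):  # v differs from u in every place
            total += 1
            if h + truncated_hamming(v, w) == len(w):
                good += 1
    return total, good


u, w, alphabet = "0120", "02", "012"
N, n, h = len(alphabet), len(u), truncated_hamming(u, w)
print(count_opposites(u, w, alphabet))     # (16, 8)
print((N - 1) ** n, (N - 1) ** (n - h))    # 16 8
```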

In general, the probability of a Hamming opposite satisfying the inequality of the
theorem is at least $\frac{(N - 1)^{n - m}}{(N - 1)^n} = \frac{1}{(N - 1)^m}$. The exceptional case is when $\Sigma = \{0, 1\}$,
i.e. $N = 2$, because in this setting Hamming opposites are unique. Here, if $\delta(u, w) <
d_2(u, w)$, then we must have $\delta(v, w) \ge d_2(v, w)$ for the unique Hamming opposite $v$ of $u$,
so at least half of the words of a given length must satisfy the inequality of the theorem.